4 research outputs found

Building a dependency treebank for Yorùbá using parallel data (Tvorba závislostního korpusu pro jorubštinu s využitím paralelních dat)

Author: Oluokun Adedayo
Publication venue: Charles University, Faculty of Mathematics and Physics
Publication date: 01/01/2018

The goal of this thesis is to create a dependency treebank for Yorùbá, a language with very few pre-existing machine-readable resources. The treebank follows the Universal Dependencies (UD) annotation standard, and certain language-specific guidelines for Yorùbá were specified. Known techniques for porting resources from resource-rich languages were tested, in particular projection of annotation across parallel bilingual data. Manual annotation is not the main focus of this thesis; nevertheless, a small portion of the data was verified manually in order to evaluate the annotation quality. In addition, a model was trained on the manual annotation using UDPipe.

Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics

CU Digital Repository

Multi-task dialog act and sentiment recognition on Mastodon

Author: Cerisara Christophe
Jafaritazehjani Somayeh
Le Hoa
Oluokun Adedayo
Publication venue: HAL CCSD
Publication date: 01/08/2018

Because of license restrictions, it often becomes impossible to strictly reproduce most research results on Twitter data only a few months after the creation of the corpus. This situation worsens gradually as time passes and tweets become inaccessible. This is a critical issue for reproducible and accountable research on social media. We partly solve this challenge by annotating a new Twitter-like corpus from an alternative large social medium with licenses that are compatible with reproducible experiments: Mastodon. We manually annotate both dialogues and sentiments on this corpus, and train a multi-task hierarchical recurrent network on joint sentiment and dialog act recognition. We experimentally demonstrate that transfer learning may be efficiently achieved between both tasks, and further analyze some specific correlations between sentiments and dialogues on social media. Both the annotated corpus and deep network are released with an open-source license.

INRIA a CCSD electronic archive server

Building a dependency treebank for Yorùbá using parallel data (Tvorba závislostního korpusu pro jorubštinu s využitím paralelních dat)

Author: Oluokun Adedayo
Publication date: 01/01/2018

CU Digital Repository

National Repository of Grey Literature

Multi-task dialog act and sentiment recognition on Mastodon

Author: Cerisara Christophe
Jafaritazehjani Somayeh
Le Hoa
Oluokun Adedayo
Publication venue: HAL CCSD
Publication date: 13/07/2018

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server